[WIP] Show SASS diffs in codediff report by jacobhinkle · Pull Request #4264 · NVIDIA/Fuser

jacobhinkle · 2025-04-17T13:59:03Z

This is almost working but there are problems with the javascript that need to be debugged.

github-actions · 2025-04-17T13:59:57Z

Description

Added support for dumping SASS to file
Updated compiled_kernel.cpp to handle SASS dumping
Modified codediff.py to include SASS in parsing and diffing
Enhanced compare_codegen.sh and run_command.sh to handle SASS files

Changes walkthrough 📝

Relevant files

Enhancement

9 files

options.cpp Added `SassToFile` option	+1/-0
compiled_kernel.cpp `Added SASS dumping to compiled kernel`	+7/-0
codediff.py `Included SASS in kernel parsing and diffing`	+101/-15
compare_codegen.sh `Updated to handle SASS files`	+3/-3
run_command.sh `Updated to handle SASS files`	+8/-6
test_diff_section.html `Added SASS buttons and sections`	+13/-2
inline_javascript.js `Updated functions to handle SASS`	+243/-191
options.h Added `SassToFile` option	+3/-2
executor_utils.h Added SASS fields to `CudaExecutable`	+2/-0

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests

⚡ Recommended focus areas for review

Possible Issue

The disassembleBinary function is called with a fixed argument "-fun 1 -c". This might not be the correct or optimal argument for all cases. It should be validated that this argument works as expected and consider making it configurable.

std::string sass_str =
    disassembleBinary(compiled_kernel->cubin, "-fun 1 -c");

Code Duplication

The toggleOldCode, toggleNewCode, and toggleDiff functions in inline_javascript.js have a lot of repeated code. This can be refactored to reduce duplication and improve maintainability.

                ["git", "branch", "--quiet", "--color=never", self.full_hash],
                capture_output=True,
            )
            .stdout.strip()
            .decode("utf-8")
            .splitlines()
        ):
            # Possible output:
            #
            #     main
            #     * scalar_seg_edges
            #
            # In this case, we have checked out the HEAD of the
            # scalar_seg_edges branch. Here we just strip the *.
            if line[0] == "*":
                line = line[2:]
                in_branches.append(line)

        def git_show(fmt) -> str:
            return (
                subprocess.run(
                    [
                        "git",
                        "show",
                        "--no-patch",
                        f"--format={fmt}",
                        self.full_hash,
                    ],
                    capture_output=True,
                )
                .stdout.strip()
                .decode("utf-8")
            )

        self.title = git_show("%s")
        self.author_name = git_show("%an")
        self.author_email = git_show("%ae")
        self.author_time = git_show("%ad")
        self.commit_time = git_show("%cd")


@dataclass_json
@dataclass
class LaunchParams:
    blockDim: tuple[int]
    gridDim: tuple[int]
    dynamic_smem_bytes: int


@dataclass_json
@dataclass
class CompiledKernel:
    filename: str
    cuda: str | None = None
    ptx: str | None = None
    sass: str | None = None
    ptxas_info: str | None = None
    launch_params_str: str | None = None
    launch_params: LaunchParams | None = None
    gmem_bytes: int = 0
    smem_bytes: int = 0
    cmem_bank_bytes: list[int] | None = None
    registers: int | None = None
    stack_frame_bytes: int = 0
    spill_store_bytes: int = 0
    spill_load_bytes: int = 0
    mangled_name: str | None = None
    arch: str | None = None
    index_type: str | None = None

    def __post_init__(self):
        self.parse_ptxas()
        self.parse_launch_params()

    def parse_ptxas(self):
        # Example input:
        #
        #   ptxas info    : 307 bytes gmem
        #   ptxas info    : Compiling entry function
        #   '_ZN76_GLOBAL__N__00000000_37___tmp_kernel_pointwise_f0_c1_r0_g0_cu_8995cef2_3255329nvfuser_pointwise_f0_c1_r0_g0ENS_6TensorIfLi2ELi2EEES1_S1_'
        #   for 'sm_86'
        #   ptxas info    : Function properties for
        #   _ZN76_GLOBAL__N__00000000_37___tmp_kernel_pointwise_f0_c1_r0_g0_cu_8995cef2_3255329nvfuser_pointwise_f0_c1_r0_g0ENS_6TensorIfLi2ELi2EEES1_S1_
        #   ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
        #   ptxas info    : Used 203 registers, 16 bytes smem, 472 bytes cmem[0], 8 bytes cmem[2]
        #
        # Here we parse this into the fields presented, and we replace the
        # mangled kernel name since it includes the kernel number and is
        # useless for the purposes of diffing since the kernel signature is
        # already included.
        if self.ptxas_info is None:
            return

        m = re.search(r"Compiling entry function '(.*)' for '(.*)'", self.ptxas_info)
        if m is not None:
            self.mangled_name, self.arch = m.groups()

        def find_unique_int(pattern) -> int | None:
            assert self.ptxas_info is not None
            m = re.search(pattern, self.ptxas_info)
            return 0 if m is None else int(m.groups()[0])

        self.stack_frame_bytes = find_unique_int(r"(\d+) bytes stack frame")
        self.spill_store_bytes = find_unique_int(r"(\d+) bytes spill stores")
        self.spill_load_bytes = find_unique_int(r"(\d+) bytes spill loads")
        self.registers = find_unique_int(r"(\d+) registers")
        self.gmem_bytes = find_unique_int(r"(\d+) bytes gmem")
        self.smem_bytes = find_unique_int(r"(\d+) bytes smem")

        self.cmem_bank_bytes = []
        cmem_banks = 0
        for m in re.finditer(r"(\d+) bytes cmem\[(\d+)\]", self.ptxas_info):
            nbytes_str, bank_str = m.groups()
            bank = int(bank_str)
            if len(self.cmem_bank_bytes) <= bank:
                self.cmem_bank_bytes += [0] * (bank + 1 - len(self.cmem_bank_bytes))
            self.cmem_bank_bytes[bank] = int(nbytes_str)
            cmem_banks += 1

    def parse_launch_params(self):
        # If NVFUSER_DUMP=launch_param is given we will get a line like this for every launch:
        #   Launch Parameters: BlockDim.x = 32, BlockDim.y = 2, BlockDim.z = 2, GridDim.x = 8, GridDim.y = 8, GridDim.z = -1, Smem Size = 49152
        # This is not done by default since we might have hundreds of thousands of these lines.
        # Still, if we recognize it, we will parse this info. If there are
        # multiple lines, we just check that they are all equal and if not then
        # we keep the first version and print a warning.
        if self.launch_params_str is None:
            return

        for line in self.launch_params_str.splitlines():

Potential Error

The sanitize_sass_lines function uses a complex regex to demangle kernel names. This regex might not cover all possible cases and could fail for certain kernel names. It should be tested thoroughly to ensure it handles all expected inputs.

"""Remove comments and remove kernel id"""
sanitary_lines = []
for l in lines:
    # Replace mangled kernel names like
    #   _ZN76_GLOBAL__N__00000000_37___tmp_kernel_pointwise_f0_c1_r0_g0_cu_8995cef2_3255329nvfuser_pointwise_f0_c1_r0_g0ENS_6TensorIfLi2ELi2EEES1_S1_
    # or
    #   _ZN76_GLOBAL__N__00000000_37___tmp_kernel_4_cu_8995cef2_3255329nvfuser_4ENS_6TensorIfLi2ELi2EEES1_S1_
    # or
    #   _ZN57_GLOBAL__N__00000000_18___tmp_nvfuser_5_cu_badbb5a6_975149nvfuser_5ENS_6TensorINS_6__halfELi3ELi3EEES2_NS_9TensorMapES3_NS0_IS1_Li2ELi2EEE,(.L_x_28 - _ZN11kernelscope6kernelENS_6TensorINS_6__halfELi3ELi3EEES2_NS_9TensorMapES3_NS0_IS1_Li2ELi2EEE)

    # with
    #   _ZN11kernelscope6kernelENS_6TensorIfLi2ELi2EEES1_S1_

    # demangle first two parts after _ZN and replace with "kernelscope" and "kernel"
    m = re.match(r"^(?P<prefix>^.*\b_Z?ZN)(?P<scopenamelen>\d+)_", l)
    if m is not None:
        d = m.groupdict()
        scopenamelen = int(d["scopenamelen"])
        # demangle second part in remainder after scope name
        remainder = l[(len(d["prefix"]) + len(d["scopenamelen"]) + scopenamelen) :]
        mrem = re.match(r"^(?P<varnamelen>\d+)", remainder)
        if mrem is not None:
            drem = mrem.groupdict()
            varnamelen = int(drem["varnamelen"])
            remainder = (
                "6kernel" + remainder[len(drem["varnamelen"]) + varnamelen :]
            )
        l = d["prefix"] + "11kernelscope" + remainder

jacobhinkle added 7 commits April 17, 2025 07:52

Add SassToFile option

8d6aab6

Dump SASS to file exactly how PTX is dumped

b60fcdd

Fix compile error

a750a5c

Update codediff scripts to enable sass generation

6c3c9f7

Accomodate sass in parsing and diff (not HTML yet)

c881d09

Sanitize SASS to remove addresses

aabd46d

Enable SASS in html report (broken)

be9c1d2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Show SASS diffs in codediff report#4264

[WIP] Show SASS diffs in codediff report#4264
jacobhinkle wants to merge 7 commits intomainfrom
jh/codediff_sass

jacobhinkle commented Apr 17, 2025

Uh oh!

github-actions bot commented Apr 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jacobhinkle commented Apr 17, 2025

Uh oh!

github-actions bot commented Apr 17, 2025

Description

Changes walkthrough 📝

PR Reviewer Guide 🔍

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant